Current Issue: January - March | Volume: 2017 | Issue Number: 1 | Articles: 5
Substantial amounts of resources are usually required to robustly develop a language model for an open-vocabulary speech recognition system, as out-of-vocabulary (OOV) words can hurt recognition accuracy. In this work, we applied a hybrid lexicon of word and sub-word units to resolve the problem of OOV words in a resource-efficient way. As sub-lexical units can be combined to form new words, a compact hybrid vocabulary can be used while still maintaining a low OOV rate. For Thai, a syllable-based unit called a pseudo-morpheme (PM) was chosen as the sub-word unit. To also benefit from the different levels of linguistic information embedded in different input types, a hybrid recurrent neural network language model (RNNLM) framework is proposed. The RNNLM not only models information from multiple input-unit types through a hybrid input vector of words and PMs, but also captures long context history through its recurrent connections. Several hybrid input representations were explored to optimize both recognition accuracy and computational time. The hybrid LM proved to be both resource-efficient and well-performing on two Thai LVCSR tasks: broadcast news transcription and speech-to-speech translation. The proposed hybrid lexicon can constitute an open vocabulary for Thai LVCSR, as it greatly reduces the OOV rate to less than 1% while using only 42% of the vocabulary size of the word-based lexicon. In terms of recognition performance, the best proposed hybrid RNNLM, which uses a mixed word-PM input, obtained a 1.54% relative WER reduction compared with a conventional word-based RNNLM. In terms of computational time, the best hybrid RNNLM has the lowest training and decoding time among all RNNLMs, including the word-based RNNLM. The overall relative WER reduction of the proposed hybrid RNNLM over a traditional n-gram model is 6.91%.
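As a hedged illustration of the hybrid input idea described above (not the authors' implementation), the sketch below maps in-vocabulary words to themselves, decomposes OOV words into pseudo-morpheme (PM) units via an assumed segmenter, and feeds the mixed token stream to a small recurrent LM in PyTorch; the class name, dimensions, and the segment_to_pms helper are assumptions for illustration only.

import torch
import torch.nn as nn

def hybrid_tokenize(words, word_vocab, segment_to_pms):
    """Map in-vocabulary words to themselves; decompose OOV words into PM units."""
    tokens = []
    for w in words:
        if w in word_vocab:
            tokens.append(w)
        else:
            tokens.extend(segment_to_pms(w))  # assumed syllable/PM segmenter
    return tokens

class HybridRNNLM(nn.Module):
    """Minimal recurrent LM over the mixed word/PM token inventory (illustrative only)."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, state=None):
        h, state = self.rnn(self.emb(token_ids), state)
        return self.out(h), state  # logits over the hybrid vocabulary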
A sound-target-searching robot system is described, which includes a 4-channel microphone array for sound collection, a magneto-resistive sensor for declination measurement, and a wireless sensor network (WSN) for exchanging information. It has embedded sound-signal enhancement, recognition, and localization methods, and a sound-searching strategy based on a digital signal processor (DSP). Three robots, acting as wireless network nodes, form the WSN together with a personal computer (PC) in order to search for three different sound targets in task-oriented collaboration. An improved spectral subtraction method is used for noise reduction. Mel-frequency cepstral coefficients (MFCCs) are extracted as the audio-signal features. Based on the K-nearest-neighbor classification method, we match the trained feature templates to recognize the sound-signal type. This paper utilizes an improved generalized cross-correlation method to estimate the time delay of arrival (TDOA), and then employs spherical interpolation for sound localization according to the TDOA and the geometrical positions of the microphone array. A new mapping is proposed to direct the motor to search for sound targets flexibly. As the sink node, the PC receives and displays the results processed in the WSN, and it also has the ultimate authority to make decisions on the received results in order to improve their accuracy. The experimental results show that the designed three-robot system implements the sound-target-searching function without collisions and performs well.
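The TDOA step can be illustrated with a plain GCC-PHAT estimator, shown below as a rough sketch; the paper's improved generalized cross-correlation variant and the subsequent spherical-interpolation localization are not reproduced, and the function name and arguments are assumptions.

import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the TDOA (seconds) between two microphone channels via GCC-PHAT."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12              # PHAT weighting keeps phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift   # lag of the correlation peak
    return shift / fs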
We present an algorithm for the estimation of fundamental frequencies in voiced audio signals. The method is based on the correlation of a signal with a segment of the same signal. During operation, frequency estimates are calculated and the segment is updated whenever a period of the signal is detected. The fast estimation of fundamental frequencies, combined with a low error rate and simple implementation, makes the method interesting for real-time speech signal processing.
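A minimal sketch of this period-detection idea, under the assumption of a stored reference segment and a fixed pitch search range, might look as follows; the segment-update rule and the authors' exact decision logic are omitted, and the search bounds are assumed values.

import numpy as np

def estimate_f0(frame, segment, fs, f_min=60.0, f_max=500.0):
    """Estimate F0 (Hz) by correlating a stored segment against the current frame.
    The frame is assumed to be longer than the segment plus the longest pitch period."""
    corr = np.correlate(frame, segment, mode="full")[len(segment) - 1:]
    lag_min, lag_max = int(fs / f_max), int(fs / f_min)
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return fs / lag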
A millimeter wave (MMW) radar sensor is employed in our laboratory to detect human speech because it provides a new non-contact speech acquisition method that is suitable for various applications. However, the speech detected by the radar sensor is often degraded by combined noise. This paper proposes a new perceptual wavelet packet method that is able to enhance the speech acquired using a 94 GHz MMW radar system by suppressing the noise. The process is as follows. First, the radar speech signal is decomposed using a perceptual wavelet packet. Then, an adaptive wavelet threshold and a new modified thresholding function are employed to remove the noise from the detected speech. The results obtained from speech spectrograms, listening tests, and objective evaluation show that the new method significantly improves the quality of the detected speech.
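The denoising step can be sketched with a generic wavelet-packet soft-thresholding routine using PyWavelets, as below; the paper's perceptual (auditory-scaled) packet tree, adaptive threshold, and modified thresholding function are replaced here by a plain universal threshold, so this is an assumption-laden illustration rather than the proposed method.

import numpy as np
import pywt

def wp_denoise(x, wavelet="db8", level=4):
    """Decompose, soft-threshold each terminal node, and reconstruct the signal."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
    for node in wp.get_level(level, order="natural"):
        coeffs = node.data
        sigma = np.median(np.abs(coeffs)) / 0.6745         # robust noise estimate
        thr = sigma * np.sqrt(2.0 * np.log(len(coeffs)))   # universal threshold
        node.data = pywt.threshold(coeffs, thr, mode="soft")
    return wp.reconstruct(update=False)[:len(x)]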
In this paper, we propose a robust voice activity detection (VAD) algorithm to effectively distinguish speech from non-speech in various noisy environments. The proposed VAD utilizes the power spectral deviation (PSD), using the Teager energy (TE) to provide a better representation of the PSD and thereby improve decision performance for speech segments. In addition, the TE-based likelihood ratio and speech absence probability are derived in each frame to modify the PSD for the final VAD decision. We evaluate the performance of the proposed VAD algorithm by objective testing in various environments and obtain better results than those attained by conventional methods.
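The Teager-energy component can be illustrated with the simple frame-level sketch below; the power spectral deviation, likelihood ratio, and speech-absence probability of the proposed VAD are not modeled, and the frame length and threshold are assumed values.

import numpy as np

def teager_energy(frame):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return frame[1:-1] ** 2 - frame[:-2] * frame[2:]

def simple_vad(signal, fs, frame_ms=20, threshold_ratio=2.0):
    """Flag frames whose mean Teager energy exceeds a crude noise-floor estimate."""
    frame_len = int(fs * frame_ms / 1000)
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    te = np.array([np.mean(teager_energy(f)) for f in frames])
    noise_floor = np.percentile(te, 10) + 1e-12   # assumed noise estimate
    return te > threshold_ratio * noise_floor     # True = speech frame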